<!DOCTYPE html>
<html>
<head>
<link href='http://fonts.googleapis.com/css?family=Ropa+Sans' rel='stylesheet' type='text/css'>
<style type="text/css">
	body {
		background-color: #444;
		color: #FFF;
		font-size: 2em;
		font-family: 'Ropa Sans', cursive;
		background: -webkit-radial-gradient(#010101, #003030);
		background: -moz-radial-gradient(#010101, #003030);
		background: -o-radial-gradient(#010101, #003030);
		background: radial-gradient(#010101, #003030);
		-webkit-font-smoothing: antialiased;
	}
	.step { width: 1024px; }
	.step:not(.active) { opacity: 0.3; }
	.substep { opacity: 0.1; }
	.substep.active { opacity: 1; }
	.substep.previous { opacity: 0.7; }
	img { margin-left: 450px; }
	.no-support-message { display: none; }
	.impress-not-supported .no-support-message {
		display: block;
		color: red;
		font-size: 2em;
	}
	A:link    { text-decoration: none; color: green }
	A:visited { text-decoration: none; color: yellow  }
	A:active  { text-decoration: none; color: yellow  }
	A:hover   { text-decoration: underline; color: red }
</style>		
</head>
<body>
<div id=impress>

<div class=step>
	<h1>DFG mapping technique</h1>
	code generation for multiple connected Forth cores<br>
	<a href="http://goo.gl/u17KA">goo.gl/u17KA</a>
</div>

<div class=step data-y=800>
<h2>Data Flow Graph?</h2>
<img type="image/svg+xml" src="radix8_2/00.svg" allign="right" style="float: right; margin-left: 10px;"/>
Program as directed acyclic graph of data flowing between operations.
<p>Vertex = operation type</p>
<p>Edge = data value</p>
<p>Label = multiplicand</p>
<p>FFT8 as DSP example</p>
</div>

<div class=step data-y=1600>
<h2>Vertex shape = operation type</h2>
<img type="image/svg+xml" src="radix8_2/02.svg" allign="right" style="float: right" width="100%"/>
<p>.</p><p>Ellipse = (+ -), Rectangle = [*], Arrow = &lt;input output&gt;</p>
<p>Example has significant data parallelism. 17 operations at first level.</p>
</div>

<div class=step data-y=2400>
<h2>Mapping approach</h2>
<ol>
<li class=substep>Uniform k-way graph partitioning; k = number of cores in use</li>
<li class=substep>Mapping to interconnect topology; one sub-graph per core</li>
<li class=substep>Instruction scheduling; Stack allocation</li>
<li class=substep>Code generation</li>
</ol>
</div>

<div class=step data-y=2400 data-x=3000 data-rotate=-90 data-scale=4>
<h2>2-way graph partitioning (example)</h2>
<li class=substep>Sub-graphs are of about the same size</li>
<li class=substep>Number of edges running between separated components is small</li>
<p></p>
<img type="image/svg+xml" src="radix8_2/03.svg" allign="right" style="float: right" width="100%"/>
<i class=substep>NP-hard problem. Many methods known</i>
</div>

<div class=step data-y=-2400 data-x=3000 data-rotate=-90 data-scale=4>
<h2>6-way graph partitioning example</h2>
<li class=substep>More colors - more edges crossing</li>
<li class=substep>Hard to partition uniformly</li>
<p></p>
<img type="image/svg+xml" src="radix8_6/03.svg" allign="right" style="float: right" width="100%"/>
</div>

<div class=step data-y=2400 data-x=6000 data-rotate=-90 data-scale=4>
<h2>Color->Color edges = inter-processor exchange</h2>
<img type="image/svg+xml" src="radix8_2/04.svg" allign="right" style="float: right" width="100%"/>
</div>

<div class=step data-y=-2400 data-x=6000 data-rotate=-90 data-scale=4>
<h2>Hard to see example with 6 partitions</h2>
<img type="image/svg+xml" src="radix8_6/04.svg" allign="right" style="float: right" width="100%"/>
</div>

<div class=step data-y=3200>
<h2>Interconnect topology</h2>
<img type="image/svg+xml" src="radix8_6/map.svg" allign="right" style="float: right; margin-left: 10px;"/>
<li>Core interactions to be mapped to existing interconnect topology</li>
<li>Interconnect will have capacity, throughput and latency</li>
<li>Each exchange is the source of data dependency</li>
<li>Data exchanges will be spreaded in time</li>
</div>

<div class=step data-y=4000>
<h2>Sub-graph serialization</h2>
<li>Operations in each node has to be executed sequentially.</li>
<li>Simple expression evaluation is easy with stack machine.</li>
<li>But DFG is not always the tree.</li>
</div>

<div class=step data-y=2400 data-x=-10000 data-rotate=90 data-scale=4>
<h2>2-way graph serialization (example)</h2>
<img type="image/svg+xml" src="radix8_2/06.svg" allign="right" style="float: right" width="100%"/>
</div>

<div class=step data-y=2400 data-x=-20000 data-rotate=90 data-scale=4>
<h2>6-way graph serialization (example)</h2>
<img type="image/svg+xml" src="radix8_6/06.svg" allign="right" style="float: right" width="100%"/>
</div>

<div class=step data-y=2400 data-x=-25000 data-rotate=90 data-scale=4>
<img type="image/svg+xml" src="radix8_6/code.png" allign="right" style="float: right" width="100%"/>
</div>

<div class=step data-y=4800>
<h2>List based Graph to Stack scheduling</h2>
<li class=substep>List of all operations is given in the initial priority order</li>
<li class=substep>Each operation becomes ready for execution when its operands are available on stack</li>
<li class=substep>Operand's availability checked by ordered set of heuristics </li>
<li class=substep>Backtracking algorithm used to return from dead branches </li>
<li class=substep>Forth code generated as output </li>
</div>

<div class=step data-y=5600>
<h2>Conclutions</h2>
<li class=substep>Scheduler may fail because of limited set of heuristics</li>
<li class=substep>Big set of heuristics may lead to extra stack manipulations</li>
<li class=substep>Parallel scheduling of all nodes will help resolving data dependencies between nodes</li>
<li class=substep>Extra code optimizations may be required</li>
<li class=substep>Set of heuristics depend on target ISA</li>
</div>

</div>

<script type="text/javascript" src="impress.sub.js"></script>
</body>
</html>
